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The search for gravitational-wave signals in detector data is often hampered by the fact that 
many data analysis methods are based on the theory of stationary Gaussian noise, while actual 
measurement data frequently exhibit clear departures from these assumptions. Deriving methods 
from models more closely reflecting the data's properties promises to yield more sensitive procedures. 
The commonly used nmtched filter is such a detection method that may be derived via a Gaussian 
model. In this paper we propose a generalized matched-filtering technique based on a Student-t 
distribution that is able to account for heavier-tailed noise and is robust against outliers in the data. 
On the technical side, it generalizes the matched filter's least-squares method to an iterative, or 
adaptive, variation. In a simplified Monte Carlo study we show that when applied to simulated 
signals buried in actual interferometer noise it leads to a higher detection rate than the usual 
("Gaussian") matched filter. 
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I. INTRODUCTION 

Since the existence of gravitational radiation was es- 
tablished as a consequence from general relativity the- 
ory, a great amount of effort has gone into the devel- 
opment of instruments and methods to detect gravita- 
tional waves directly [1, 2]. Gravitational waves (GWs) 
are notoriously weak compared to the sources of noise in 
today's ground-based gravitational-wave detectors, and 
so it takes both extraordinarily sensitive instruments as 
well as sophisticated data analysis techniques to measure 
them. The output of an interferometric GW detector is 
essentially a time series of nonwhite noise, and — poten- 
tially — a superimposed signal whose exact waveform is 
determined by several parameters. Data analysis aiming 
for GW detection hence requires filtering of time-series 
data for rare, weak signals that are often of a known, 
parametrized shape. Many commonly used approaches 
are based on matched Bltering the data. The matched 
filter may be derived as a maximum-likelihood (ML) de- 
tection method in the framework of a Gaussian noise 
model, but more generally will actually be ML procedure 
for a wider class of models. While the method works re- 
markably well and is able to discriminate weak signals 
from the noise, it commonly runs into problems due to 
non-Gaussian or nonstationary behavior of the actual in- 
strument noise. For example, the matched filter often 
is sensitive to outliers or loud transient noise events in 
the data, which, although showing little similarity with 
the signal sought for, also do not look like plain noise 
either. A lot of effort needs to go into identifying such 
false alarms. 

We propose a more robust procedure that is based on 
a Student-t distribution for the noise, as introduced in 
Ref . [3] . Several motivations may be used for introducing 
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the Student-t model; most obviously it exhibits "heav- 
ier tails" and non-spherical probability density contours, 
allowing one to accommodate outliers in the noise. Al- 
ternatively, the model may also be seen as incorporating 
imperfect prior knowledge of the noise spectrum, either 
because it is only estimated to limited accuracy, or be- 
cause it is varying over time. Models of this kind are 
commonly used for robust parameter estimation, but, as 
we will show in the following, the model also exhibits 
a better performance for detection purposes when the 
assumption of stationary Gaussian noise is violated. We 
expect the proposed filtering method to be useful in other 
signal-processing contexts as well. 

In Sec. II we will first derive the usual matched filter 
from a Gaussian noise model. In Sec. Ill we introduce 
the Student-i model, elaborate on the motivation for its 
use as well as point out the differences from the Gaussian 
model, and derive the analogous filtering procedure. In 
Sec. IV we report on a case study using real detector data 
and simulated signals to show that here the Student-t 
based filter indeed yields a better detection rate. We 
close with some concluding remarks. 

II. GAUSSIAN MATCHED-FILTERING 
A. General 

A matched Rlter may be derived in different ways, for 
example based on considerations of the residual sum-of- 
squares (or power) decomposition, without reference to a 
more specific noise model [4]; however, here we will con- 
centrate on a derivation via the assumption of station- 
ary Gaussian noise and the Whittle likelihood. This will 
allow us to easily generalize the usual matched-filtering 
method to the case of Student-t distributed noise in the 
following. It is important though to keep in mind that 
the matched filter is not necessarily connected to the as- 
sumption of Gaussian noise. When we say "Gaussian 
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matched filter", this is meant to refer to its derivation 
and interpretation in the Gaussian context. 

B. The Gaussian noise model 

In order to implement the assumption of stationary, 
Gaussian noise residuals, the Whittle likelihood approx- 
imation is commonly utilized [3, 5, 6]. In the Whittle 
approximation, signal and noise time series are treated 
in their Fourier-domain representation. The explicit as- 
sumption being made on the noise n{t) is that its dis- 
crete Fourier transform n(/) is independently Gaussian 
distributed with zero mean and variance proportional to 
the power spectral density (PSD), 

Var(Re(n(/,))) = Var(lm(n(/,))) = ^S,{f,), (1) 

where fj is the jth Fourier frequency, Si{fj) is the cor- 
responding one-sided power spectral density, and j = 
0, . . . ,N/2 indexes the Fourier frequency bins. An ex- 
plicit definition of the Fourier transform conventions used 
here is given in the appendix. 

For some measured data d{t) one then commonly as- 
sumes a parametrized signal sg{t) with parameter vec- 
tor 9 and additive Gaussian noise with a known 1-sided 
power spectral density Si{f): 

d{t)=se{t)+n{t) ^ d{f) = Se{f)+n{f) (2) 

(i.e., additivity holds in both time and Fourier domains). 
The corresponding likelihood function then is given by 

pH.)ocexp(^-,E ^s^if,) ) 

[3]. 

C. Likelihood maximization 

1. ML detection and the profile likelihood 

If there were no unknown signal parameters to the 
signal model (like time-of-arrival, amplitude, phase,...), 
then, according to the Neyman-Pearson lemma [7], the 
optimal detection statistic would be the likelihood ratio 
between the "signal" and "no-signal" models. Once there 
are unknowns in the signal model, a common approach is 
to use a generalized Neyman-Pearson test statistic^ that 
is, the maximized likelihood ratio, where maximization 
is carried out over the unknown parameters [7]. While 
this is in general not an optimal detection statistic, this 
ad hoc approach is often efficient and effective. Such a 
maximum likelihood (ML) detection approach is closely 
related to ML estimation, as either way the parameter 
values maximizing the likelihood will need to be derived. 
In case of a Gaussian noise model as in (3), maximization 



of the likelihood is equivalent to minimizing a weighted 
sum-of-squares, i.e., a weighted least-squares approach. 

It should be noted that in a Bayesian reasoning frame- 
work, the detection problem would be approached via 
the marginal likelihood rather than the maximized like- 
lihood [8, 9]. The marginal likelihood is the expectation 
of the likelihood function with respect to the prior dis- 
tribution, and both marginal and maximized likelihood 
may be equivalent for a certain choice of the prior dis- 
tribution. One can show the marginal likelihood to be 
optimal for any particular choice of prior distribution, 
while the maximized likelihood in general is not (see e.g. 
[10; 11]). Maximization of the likelihood on the other 
hand is commonly much easier computationally. 

As will be seen in the following, it is often convenient 
to divide the parameter vector into subsets, as it may 
be possible to analytically maximize the likelihood for 
fixed values of some parameters over the remaining pa- 
rameters. This maximized conditional likelihood as a 
function of a subset of parameters is also called the pro- 
file likelihood. If a profile likelihood is given, likelihood 
maximization may be reduced to maximizing over the 
remaining lower-dimensional parameter subspace. As an 
example, consider a signal having three free parameters: 
amplitude, phase, and time of arrival. If likelihood max- 
imization can be done analytically over amplitude and 
phase for any given arrival time, this results in a profile 
likelihood that is a function of time. The likelihood's 
overall maximum then may be computed via a numerical 
brute-force search of the profile likelihood over the time 
parameter. 



2. Why care about linear models? 

In signal processing in general, and in GW data anal- 
ysis in particular, the signals of interest are commonly 
parametrized (among other additional parameters) in 
terms of an amplitude and a phase parameter. Consider 
for example a simple sinusoidal signal of the form 

SA,4, J (t) = A sin{27r ft + (I)) (4) 
= ft sin(2^/t) + ft cos(27r/t) (5) 

which instead of amplitude A and phase (j) rnay cquiv- 
alcntly be parametrized in terms of sine- and cosine- 
amplitudes ft and ft. Other examples of signal mod- 
els given in terms of linear combinations are the singular 
value decomposition approach used e.g. in [12, 13] or the 
transformation of antennae pattern effects into four am- 
plitude parameters in the derivation of the F-statistic 
[14]. A linear model formulation will turn out conve- 
nient in the following, as a linear (or conditionally linear) 
model will allow us to perform (conditional) likelihood 
maximization analytically. 
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3. The general linear model 



4- The detection statistic and its distribution 



Consider a linear model for the data, i.e., 

y = Xp + e (6) 

where y is a A^-dimensional data vector, X is a {N x fc)- 
matrix, /? is a /c-dimensional parameter vector, and e is 
an A'^-dimensional vector of error terms. The errors e are 
assumed to be Gaussian distributed with mean zero and 
some covariance matrix S. 

In the above signal-processing context, y and e are the 
A'^-dimensional vectors of re-arranged real and imaginary 
parts of Fourier-domain data (d) and noise (n), the sig- 
nal sg is given by a linear combination of the columns 
of a matrix X according to the parameter vector (3, 
and the noise covariance S is a diagonal matrix defined 
through (1). 

The Gaussian likelihood function is characterized by 

p(y|/3)cx-(2/-X/3)'l]-i(2/-X/3). (7) 

In the linear model, the likelihood may be maximized 
analytically, and the ML estimator for the unknown pa- 
rameter vector /3 is given by 

/3 = {X'^-^X)-^X'^-^y (8) 

[8, 15]. 

In the models of concern here, estimation is simpli- 
fied by the fact that the noise covariance E is a diagonal 
matrix (1) so that its inverse again is diagonal. In ad- 
dition, here we add the common assumption that the 
vectors spanning the signal manifold, the columns of X, 
are orthogonal. A non-orthogonal basis X would com- 
plicate the procedure slightly; see e.g. [14]. Under these 
conditions, the pivotal quantities for ML estimation and 
detection are 



We define the statistic 
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(9) 



(10) 



i.e., the quadratic forms, or inner products, involving the 
jth basis vector (jth column of X) with the data vector y, 
and with itself. The elements of the parameter vector's 
ML estimate /3 are then given by 

= (11) 

the maximized likelihood ratio vs. the no-signal model is 
given by 



and the fitted values are given by 

k k 



y 



xp = ;^/3,x, = ;^^x^. 



(12) 



(13) 
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2xlogf^) 



(14) 



(see also (12)) which, under the null hypothesis of the 
data y being purely noise, is distributed with k de- 
grees of freedom. Under the signal hypothesis, when a 
signal = X/3* is present in the data, the correspond- 
ing figure evaluated at the true parameter values 



2 X log 



[ p{y\n 
V P(y|0) 



(15) 



will be Gaussian distributed with mean and variance 
4g^, where 



N 

E 



{I2j=l Pj^id ) 



N 

E 



(^r)j 



= (x^*)'s-i (X^*) 



(16) 
(17) 



is the true signal's signal-to-noise ratio (SNR). Conse- 
quently, for a signal of given SNR g"^, the expected log- 
arithmic likelihood ratio evaluated at the true parame- 



ters is E 



log 



while the likelihood ra- 



tio 



p(.y\l3*) ' 
pivlo) ' 

follows a log-normal distribution with median 

p(jy|/3* 



p(y|o) 



exp(£»^). The 



p(y|o) 

exp(ip^) and expectation E 

maximized likelihood ratio will be larger than that; the 
statistic Hk follows a noncentral x^(p^)-distribution with 
nonccntrality parameter g^, its expectation is g'^ -\- k , so 



that E 



log 



' p{y\0) ' 

p{y\0) . 



j=l 



3 = 1 



= ^(^g-^ + k). Note that the GW 

and signal-processing literature is sometimes confusing, 
as both g'^ and H^, or their square roots, are commonly 
referred to as the SNR. 

In common signal detection problems, the signal model 
is usually only partially linear, as suggested in Sec. II C 2, 
so that analytical maximization over the "linear" param- 
eters only yields a maximized conditional likelihood, or 
profile likelihood. The statistic Hk then is proportional 
to the profile likelihood, and (since the likelihood under 
the "noise only" null hypothesis, p(y|0), is a constant) 
constitutes a generalized Neyman-Pearson test statistic. 
This statistic, or its maximum over additional parame- 
ters, is commonly referred to as a detection statistic, as 
it is used to find the signal fitting the data best, and 
to determine its significance. The detection statistic's 
distributions under null and alternative hypotheses as 
stated above only apply for a single (conditional) likeli- 
hood maximization, i.e., for a given data set y and a given 
model matrix X. When maximizing the profile likelihood 
over additional parameters (or pieces of data), the test- 
ing problem turns into a multiple testing problem, and 
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the statistic's distribution will be an extreme value statis- 
tic [7, 16]. Since the particular statistic only comes 
up in the context of the Gaussian model, we will in the 
following be mostly referring to the more universal cor- 



responding likelihood ratio figure 



p(y|o) 



exp(ii7fc). 



D. Common implementation and terminology 

In the GW data analysis literature, likelihoods and 
matched filters arc commonly expressed in terms of the 
inner product (a, b) of real- valued functions (signal tem- 
plates or data) a and 5, technically defined in terms of 
analytical Fourier transforms. 



(a, 6) = 



Siif) 



df 



(18) 



[6, 14], which in practice is implemented (analogously 
to the Whittle likelihood) in terms of discrete Fourier 
transforms, 



(a, 6) 



(19) 



^ ^ [Rc(6(/,))Re(6(/,))+Im(a(/,))lm(6(/,)) 





3=0 



In terms of the linear models discussed in the previous 
section, this is equivalent to a quadratic form 



a'^-H (20) 



as in Eqs. (9), (10) above. Note that especially in the 
context of the Student-t model discussed below, expres- 
sion (18) may be hard to motivate, as it is continuous in 
frequency, but the corresponding discrete expression (19) 
may readily be related to expressions derived above. In 
this terminology, the signal-to-noise ratio of a signal sg 
(16) turns out as 



3 4At 



^ SlUj) 



2 {s0,se) 



(21) 



the correlation of some data d with a template se (as in 
(9)) simplifies to 



E 



[Re(J(/,))Re(Se(/,)) + Im(J(/,))lm(Se(/,)) 



4At' 



2(d,se), (22) 



the likelihood ratio of some signal template s for given 
data d is 



log I 



P{d\se) \ ^ ^3 s,if,) 
V p(d\0) / V l'^'^-^^)!' 

= 2 (d, sg) - (se, sg) 



(23) 
(24) 



[3, 6, 14], and the maximized likelihood ratio for a signal 
that is a linear combination of waveforms (d — /3jSj + 
n, see also (12)) then is 



(25) 



An implementation of a matched filter in the GW con- 
text is concisely described e.g. in [17, 18]. The signal 
searched for is a "chirping" binary inspiral waveform of 
increasing frequency and amplitude, which is character- 
ized by five parameters, namely two mass parameters 
determining the phase/amplitude evolution, and ampli- 
tude, phase and arrival time. The signal waveform s for 
given mass parameters i!) = (7711,7712) is (in analogy to 
(4)) given in terms of "sine" and "cosine" components 
Ss^^ and Sc,^, 

Si}{t) = l3sSs,^{t - tn) + l3cSc,i){t - to) (26) 

[17], where /3s and /3c are determined by the orbital phase 
and orientation of the binary system, and to defines the 
signal arrival time. The sine and cosine waveforms here 
constitute the signal manifold's orthogonal "basis vec- 
tors" . The actual matched-filter detection statistic is de- 
fined as p{to) = yX^(ti^)2~K^^(t^, where 



X'/^{to) oc 



dif){Ss/c.Af))* cxp(-2^i/^o) 
5,(1/1) 



df (27) 



[17], and where the exponential term does the time shift- 
ing of data and template against each other. For any 
given time shift to, this filter corresponds to (the square 
root of) the detection statistic Hk above (14). Comput- 
ing the matched filter (27) across time points to yields 
the profile likelihood, the conditional likelihood (condi- 
tional on time to and waveforms Sg^^, Sc,ij) maximized 
over phase and amplitude. The "overall" maximum like- 
lihood then is determined via a brute-force search over to 
and over additional alternative signal waveforms corre- 
sponding to different mass parameters d. Note that the 
search over arrival time to in (27) may be efficiently 
implemented via another Fourier transform [18]. The 
matched-filtering algorithm is also described in more de- 
tail in Appendix A3. 

In order to claim the detection of a signal, one needs 
to determine a threshold for the detection statistic (the 
maximized likelihood) , with respect to some pre-specificd 
false alarm rate. The detection statistic's distributions 
derived in Sec. II C 4 are likely not to be of much prac- 
tical relevance, due to common non-Gaussian or nonsta- 
tionary features in the data. Critical values for the de- 
tection statistic instead are commonly computed using 
bootstrapping methods (see e.g. [19, 20]). 
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FIG. 1. Density functions of Gaussian and Student-i distributions. Tiie left panel sliows univariate densities on tlie logarithmic 
scale. The right panel shows density contours of the joint distribution of two independent Gaussian random variables in contrast 
with two independent Student-t distributed variables of the same location {^) and scale (a). The two Student-f variables have 
differing degrees-of-freedom; the one corresponding to the x axis has u — 3, while the one along the y axis has f = 10. 



III. THE STUDENT-T FILTER 

A. The Student-t noise model 

The Student-i model for time series analysis was in- 
troduced in [3] as a generalization of the commonly 
used Gaussian model described in the previous section. 
The Studcnt-t distribution has an additional dcgrccs-of- 
freedom parameter, essentially controlling the distribu- 
tion's heavy-tailedness, i.e., the allowance for large out- 
liers. The Student-t likelihood function is given by 

p{d\0) 

[3]. According to this model, the residuals (Re(n(/j)), 
Im(n(/j))) within each Fourier frequency bin j follow a 
bivariatc Studcnt-t distribution [8] with location /i = 0, 

scale matrix E = (^^^[j^^ S (/))' '^'^8'"*^'^^"°^" 
freedom Vj > and implicit dimension 2. This implies 
that (i) residuals in different frequency bins are indepen- 
dent; (ii) residuals within the same bin are uncorrelated, 
but dependent; and (iii) the marginal distribution of each 
individual residual is a Student-t distribution with scale 
proportional to Si{fj) and dcgrces-of- freedom vj. De- 
creasing values of the degrees-of-freedom parameters i^j 
imply a heavier-tailed distribution, and in the limit of 

— > oo the model again reduces to the Gaussian model. 

Besides simply constituting a heavier-tailed noise 
model, the Student-t model arises as a generalization 
of the Gaussian model when the power spectral den- 
sity S{fj) is treated as uncertain, where the degrees-of- 



freedom parameter vj denotes the (prior) precision [3]. 
So the model is not only applicable in contexts where 
the noise itself is in fact t distributed, but also in cases 
where it is Gaussian, but the noise spectrum is a pri- 
ori only known to a certain accuracy. Alternatively, the 
same model would result when the noise spectrum itself 
was assumed to be randomly deviating from the scale 
parameter S'i(/), according to a distribution, e.g. be- 
cause it is only estimated with some uncertainty, which 
in fact resembles the original motivation for introducing 
Student's t-distribution in the context of the t-test and 
related procedures [7, 21]. Both randomness or uncer- 
tainty of the noise PSD technically lead to the same like- 
lihood expression here [3]. In general, the interpretation 
of the scale parameter 5*1 (/) in the contexts of the Gaus- 
sian and the Student-t model is not necessarily exactly 
the same. For the Gaussian model, it may be defined 
via the expected power Si{fj) = E[2^ while 
for the Student-t model this only holds in the limiting 
case of great certainty {ly — >■ oo). Within the Student-t 
model, the 5'(/) term specifies the scale of the uncer- 
tain PSD parameter and the expected power is in fact 
given by E[2^ = The choice of 

the degrees-of-freedom parameter i^j as well as the spec- 
trum parameter Si{fj) may be approached in different 
ways and may for the filtering purpose eventually be con- 
sidered a matter of tuning [3]. In the example in Sec. IV 
below, we simply kept the scale parameter Si{fj) to be 
the estimated noise spectrum as in the Gaussian case, and 
fitted a common degrees-of-freedom parameter i/j = i' for 
all frequency bins to the empirical data. 



B. Comparison to the Gaussian model 

When comparing to the Gaussian distribution, first of 
all the Student-t distribution exhibits heavier tails, i.e., 
the probability for obtaining "large" values (relative to 
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the distribution's scale) is much greater. While the den- 
sity functions are very similar within the range of ^± 2(T, 
where the bulk of probability is concentrated, the densi- 
ties' ratio will grow indefinitely toward the distributions' 
tails (see Fig. 1). The degrees-of-freedom parameter v 
controls the distribution's heavy-tailedness; a setting of 

V = 1 yields the "pathological" Cauchy distribution, for 

V > 2 the variance is finite, and in the limit of — > oo it 
again approaches the Gaussian distribution. 

Another discriminating feature is the shape of the den- 
sity contours. While a Gaussian density will always have 
elliptical contours, the Studcnt-t distribution is differ- 
ent in that its contours arc rather diamond-shaped, with 
elongations pointing along the principal axes (see Fig. 1). 
This way the Student-t model does not only allow for 
larger outliers, but it also considers outliers more likely 
to occur only in individual variables rather than jointly in 
all variables. Note that this latter effect follows from the 
fact the different frequency bins arc stochastically inde- 
pendent and not merely uncorrclatcd [22, 23]. Since the 
two (real and imaginary) residuals within each Fourier 
frequency bin follow a joint, bivariate, t distribution, 
the density contours within bins will still be spherical — 
otherwise a strange phase/amplitude dependence would 
be implied for the Fourier-domain model. The effect of 
independent Studcnt-t variables only comes to bear be- 
tween frequency bins. 

An important difference to note between the Gaussian 
and Studcnt-t model is that the least-squares fitting that 
results from the Gaussian model will actually be a ML 
procedure for any model within the wider class of "ellip- 
tically symmetric" models for the noise residuals (includ- 
ing e.g. a Student-i model with merely uncorrclatcd^ but 
not independent residuals) [22, 23]. The Student-i model 
described here hence advances into a fundamentally dif- 
ferent class of models. 

Student-i or similar models are commonly used in pa- 
rameter estimation contexts as robust alternatives to 
the Gaussian model that are less sensitive to outliers in 
the data [24-27]. Such models may be motivated in a 
"top-down" manner by the observation that the data do 
not actually fit the Gaussianity assumption, or also in 
a "bottom-up" way by pointing out that the resulting 
least-squares procedures are very sensitive to occasional 
outliers in the data. In the spirit of the latter view- 
point, the concept of M estimation was introduced, which 
aims at "fixing" outlier-sensitive least-squares procedures 
by replacing them by more robust statistics correspond- 
ing to more favorable influence functions [28, 29]. Sim- 
ilar approaches, namely down-weighting or ignorance of 
outliers in the data, have been proposed in the context 
of gravitational- wave detection before [30, 31], and the 
Student-t assumption may in fact be considered a spe- 
cial case of M estimation [26, 27]. 

Another fix that is commonly applied in GW data anal- 
ysis is the veto [32] , which is a figure computed along 
with a detection statistic that is supposed to discriminate 
actual signals from noise bursts. Such noise events may 



show little similarity with the signal template, but may 
often, due to non-negligible correlation with the template 
and very large power, still seem to indicate the presence 
of a signal. The veto then essentially checks for excess 
power that is inconsistent with the shape of the signals 
aimed for and that way will rule out such alleged detec- 
tions. The consideration of excess residual power is also 
implicitly happening in the Student-t model. From the 
different likelihood formulations ((3), (29)) one can write 
down the corresponding likelihood ratios for some data d 
and a signal template sg, 



log 



p{d\9, Gauss) \ 
p{d\d, Gauss) / 



El 



\d{f,) - Sgif,)\ 



(30) 



/ p(d|6l, Student) \ 
°^ Vp(d|0, Student) j 



j 



log 



(31) 



In both of the above cases the likelihood ratio is a func- 

\d{fj " 



tion of the ' 



power 



'data power" 



and the "residual 



i.e., the data's normalized sum- 



of-squarcs in each frequency bin j before and after sub- 
tracting the signal sg. For the Gaussian case, a "data 
power" of 10 and a "residual power" of 1 in the jth 
bin would have the same effect on the likelihood ratio 
as if the numbers were, say, 1010 and 1001 instead; the 
only relevant figure is their difference. In the Student-i 
model, the latter case would lead to a lower likelihood 
ratio; here not only the amount by which the signal sg is 
able to reduce the sum-of-squares is relevant, but so is its 
magnitude relative to the remaining residual term. The 
additional feature of the ML fit that is intrinsically con- 
sidered in the Student-i likelihood ratio (31) is essentially 
the corresponding cocfhcicnt of determination (i?^) [15]. 
As will become obvious in the following, when the actual 
implementation is described, the generalization to the 
Student-t model will on the technical side essentially re- 
place the least-squares procedure by an adaptive version. 
The adaptation step again ensures that excess residual 
noise power will downwcight the supposed significance of 
a signal. 



C. Likelihood maximization: the EM-algorithm 

While likelihood maximization in the Gaussian model 
boils down to least-squares fitting, the maximization step 
is not quite as simple for the Student-t model. However, 
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due to the structure of the problem, the expectation- 
maximization (EM) algorithm may be used to efficiently 
maximize the likelihood function [8, 33]. In order to ap- 
ply the EM algorithm, the likelihood expression needs to 
be reformulated. The Student-i likelihood may be viewed 
as a marginal likelihood, averaging out a set of unknown 
variance parameters c?^ [3] . Each of the variance param- 
eters ffj then corresponds to the power spectral density 
at the jth Fourier frequency bin. The EM algorithm's 
details as applied to the present problem are derived in 
detail in Appendix A 2 below. It turns out that maxi- 
mization of the Student-t likelihood may be done in an 
iterative manner, where each iteration again requires a 
weighted least-squares fit as in the Gaussian matched fil- 
ter. The EM algorithm requires a starting value Oq for 
the signal parameters. Given Oq, the expression 





\d{fj)-Ss{fj) 


1 2 




d{fj)-Seo{f,}\ 





(32) 

is maximized with respect to the parameter vector 6. The 
parameter value maximizing the above expression then 
constitutes the new Oq value, for which the expression 
again is maximized, and so forth. The resulting sequence 
of parameter values then converges to the maximum like- 
lihood estimate 9 [8]. 

Maximizing the above expression (32) again amounts 
to a weighted least-squares fit, exactly as in the case 
of the Gaussian matched filter (see also the correspond- 
ing likelihood expression (3)). The Student-i filter will 
therefore generalize the Gaussian matched filter by re- 
placing the least-squares procedure by an iterative, or 
adaptive, least-squares fit. Note that the denominator 
in (32) simply is a weighted average of the noise spec- 
trum (as in (3)) and the previous iteration's residual noise 
power, where the degrees-of- freedom parameter h' defines 
the relative weighting. Instead of the "plain" weighted 
least-squares match that is done in the Gaussian filter, 
the EM-iterations adapt the weights (the denominator 
in (32), which in the Gaussian model was the a priori 
known, fixed noise spectrum) to the residual noise power 
as found in the data, and the level of adaptation is reg- 
ulated by the degrees-of-freedom parameter v. 

The (ML) detection statistic does not follow a simple 
distributional form as in the Gaussian model, but in the 
example below one can already see that both statistics 
still behave similarly. The generalized likelihood ratio 
statistic will, by Wilks' theorem, in fact still approxi- 
mately follow a distribution [7, 34]. 

D. The filter implementation 

As for the Gaussian matched filter, the aim again is 
to maximize the likelihood (29), i.e., find best- fitting pa- 
rameter values 6 in parameter space. Again, it is ad- 
vantageous if the signal model can (at least partly) be 
formulated as a linear model. 



There are two obvious points in the matched-filtering 
procedure at which one could insert the EM-iterations 
in order to generalize it to the Student-i case: either at 
the level of each (originally analytical) maximization over 
linear model coefficients (usually corresponding to ampli- 
tude and phase), or at a higher level, iterating over linear 
coefficients as well as the signal arrival time parameter. It 
is not obvious whether one implementation is more sen- 
sitive than the other, but there definitely are differences 
in the implied computational costs. Both approaches are 
described and discussed in more detail in Appendix A3. 
An implementation of the latter algorithm, together with 
the analogous matched filter, is available in [35]. In case 
of a brute-force search over additional signal parameters 
(i.e., a "template bank"), one could in fact consider mov- 
ing the EM-lcvel yet another stage higher. 

As a starting parameter value (^o) for the algorithm, 
one could, for example, use the null vector or an ini- 
tial least-squares fit. As a stopping criterion, one could 
terminate the algorithm once the improvement in loga- 
rithmic likelihood from the previous iteration falls below 
some threshold, or when some maximum number of iter- 
ations is reached. Note that — unlike for the Gaussian lin- 
ear least-squares fit — the (conditional) likelihood might 
actually be multimodal [36], so that different starting 
values might lead to different results. It is not obvious 
whether this occurs frequently in practice, or rather re- 
quires particularly rare pathological circumstances; how- 
ever, it does not appear to pose a problem in the example 
below. 



IV. FILTERING EXPERIMENT ON ACTUAL 
DATA 

A. General 

Besides any theoretical or heuristic arguments why a 
Student-i based filter may improve detection, the figure 
of eventual relevance is going to be the resulting improve- 
ment in detection efficiency when applied to actual data 
— keeping in mind the additional complication and com- 
putational cost. In the following, we will demonstrate the 
filter's performance in a minimalistic, yet realistic toy 
problem. To that end, we will set up a filter for a certain 
kind of parametrized signal, and then test it against a 
conventional matched filter using injections of simulated 
signals. For the additive noise, we will use both simu- 
lated Gaussian noise as well as actual gravitational- wave 
detector instrument noise. Detection efficiency is going 
to be measured via the receiver operating characteristic 
(ROC) curve, allowing one to compare detection proba- 
bilities for given false alarm probabilities, or vice versa. 

In order to make the example realistic, we require a 
nontrivial signal waveform to be searched for; in par- 
ticular the waveform should not be monochromatic, but 
should instead span a wider range of Fourier frequencies. 
There should be parameters to be maximized over ana- 



8 




5 10 20 5 10 20 

theoretical quantile theoretical quantiie 



FIG. 2. Quantile-quantile plots (Q-Q plots) of the empirically found normalized residual noise power (33) versus its theoretical 
values assuming Gaussian and Student-f models. The marks indicate particular quantiles corresponding to powers of 10 in tail 
probability. The 10 largest empirical samples are shown as individual dots; the remaining quantiles are connected by a line. 



lytically as well as numerically, and wc should use noise 
that is non-Gaussian or nonstationary. The example de- 
scribed in the following mimics the setup of a search for 
binary inspiral signals in interferometric gravitational- 
wave detector data (see e.g. [17]). The noise data are 
taken from an actual detector, and, for comparison, a 
second data set of simulated, Gaussian noise of a realis- 
tic noise spectrum is used in parallel. The "search" being 
performed however is much simplified and not intended 
to be exhaustive or to span an astrophysically sensible 
parameter range. 



B. The data 

The data used in the following examples are going to 
be either simulated Gaussian noise with a power spec- 
tral density corresponding to LIGO's initial design sen- 
sitivity [37], or real instrument noise from LIGO's Liv- 
ingston interferometer, taken during LIGO's fifth science 
run ("S5") in late 2005 [38]. The data will be considered 
in chunks of 8 seconds length, downsampled to a sampling 
rate of 1024 Hz, and windowed using a Tukey window ta- 
pering 10% of the data (5% at each end). The noise's 
power spectral density 5*1 (/) is estimated essentially us- 
ing Welch's method [39], by considering the empirical 
power in the 32 preceding data segments, and taking the 
median as a robust estimator. The figures shown in the 
following are each based on 100 000 such data chunks. 

The signal waveform searched for here is taken to be a 
binary inspiral waveform approximated to the 2.0 post- 
Newtonian order [40] . The same waveform family is used 
for both injections as well as in the detection stage, and it 
has five free parameters: chirp mass (mc), mass ratio (77), 
coalescence time (tc), coalescence phase {(pc), and ampli- 
tude (A). The signal waveforms injected into the data 
were all done at the same mass parameters (toc — 4.5, 
77 = 0.25), and the amplitude is set such that the sig- 
nal's SNR (as computed based on the current PSD es- 



timate) is g = 
log(10'^) 



sJq^ = 5.257 so that E 
and E 



log 



exp ( Q 



V p(y\0) ' 
= 1012 



(see 



pfal/3*) 

p(y|o) 

also Sec. II C 4). Each 8-second chunk of data is eventu- 
ally analyzed twice, with and without a signal injection. 



C. Setting the degrees-of-freedom parameter 

In order to determine a suitable degrees-of-freedom pa- 
rameter V for the Student-i model, we considered the tail 
behavior of the noise. If the Gaussian (Whittle) model 
was accurate, then the normalized Fourier-domain noise 
power at the jth frequency bin. 



4f7'S'i(/i) 



(33) 



being the square root of the sum of two independent stan- 
dard Gaussian random variables (see Sec. II B), should 
follow a Rayleigh distribution. The residuals' normaliza- 
tion here is done — in analogy to the computations done 
in an actual search — via the estimated noise spectrum, 
as described in the previous subsection. We are only con- 
sidering the binned noise power here (and not the individ- 
ual real and imaginary components) as this is the relevant 
figure entering both the Gaussian as well as the Student-i 
likelihoods ((3), (29), (30), (31)). Under the Student-t 
model, instead of being Rayleigh distributed, the power 
(33) would instead follow a similar, more heavy-tailed 
distribution. We will refer to the Studcnt-t power's distri- 
bution as the Student-Rayleigti^ distribution here; more 
details on this distribution's particular form are given in 
Appendix A 4. 

We investigated the empirical distribution of actual 
noise residuals, for both simulated and actual instrumen- 
tal data. For the simulated data, this will account for 
effects of finite sample size, windowing and PSD estima- 
tion, and for actual data it will in addition give some in- 
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FIG. 3. Detection statistics (maximized likelihood ratios) based on Gaussian and Student-f models for simulated Gaussian 
data (left panel) and actual interferometer noise (right panel). Injections were of SNR g — 5.257. 



sight into the effects of realistic nonstationaritics or non- 
Gaussianitics in actual measurement noise. The noise 
samples are based on the residuals from 200 eight-second 
noise realizations of either simulated Gaussian noise, or 
actual instrument noise from LIGO's Livingston inter- 
ferometer. The residuals (33) are each normalized via 
a PSD estimate from 32 preceding noise samples, as de- 
scribed in the previous section, yielding a total of 800 000 
residuals. The data used here did not overlap with the 
data used in the following detection experiment. 

Figure 2 shows quantile-quantile plots (Q-Q plots) il- 
lustrating how well the models fit the actual data. The 
axes indicate theoretical (Rayleigh or Student-Rayleigh) 
quantiles, and the empirical quantiles as found in the 
data. If a model fits the data well, both theoretical and 
empirical quantiles should coincide, so that the quantiles 
follow a straight, diagonal line. A mismatch between 
model and data results in a differently shaped curve; in 
particular, if the data are more heavy-tailed than pre- 
dicted by the model, the curve will show an upward bend 
[41]. 

One can see that the actual data exhibit heavier tails 
in both cases of simulated, Gaussian noise as well as the 
instrument noise. In the case of Gaussian noise this is due 
to the estimation uncertainty in the noise spectrum. If 
we had been using the mean instead of the median to esti- 
mate the noise PSD, then the distribution of normalized 
noise residuals should be exactly Student-t with degrees 
of freedom equal to twice the number of noise samples 
averaged over (here, 32x2 = 64) [7, 21]. For the median 
estimation method, this is only approximately true, but 
apparently still roughly accurate; a maximum-likelihood 
fit for 1/ suggests a value of « 40 here. For the case 
of Gaussian data, the mismatch between assumed and 
observed quantiles is minimal anyway. 

For the real interferometer noise, the discrepancy be- 
tween Gaussian model and actual data is more dramatic; 
in the distribution's tails, the empirical quantiles are sig- 
nificantly larger than the assumed quantiles. For exam- 
ple, according to the Gaussian model, 99.99 % of the sam- 



ples should be < 4.3, while empirically the 99.99 % quan- 
tile lies at 8.1 for actual instrument noise (see the right 
panel of Fig. 2). A Student-t model seems to fit the data 
better, especially in the distributions' tails, although dis- 
crepancies in the extreme outliers are still large. Try- 
ing to estimate the degrees-of-freedom parameter v from 
different subsets of the empirical data yields ML esti- 
mates roughly in the range from 5 to 50; in the following 
we simply fixed the parameter at = 10 for the sim- 
ulations involving actual data. A value of > 40 would 
not seem to make sense here (even if the data were per- 
fectly Gaussian) and in the simulation results below we 
found that detection performance seemed to depend only 
weakly on v as long as it was roughly in the range 5-20. 
While the Student-t distribution does not fit the data 
perfectly, it seems to fit better than the Gaussian model. 
Instead of only fitting the degrees-of-freedom parameter, 
one could actually in addition also adapt the t distribu- 
tion's scale to the data (see also Sec. Ill A, or [3]). 



D. Filtering setup 

For each piece of data, the likelihood ratio is maxi- 
mized over phase and amplitude for given combinations 
of time and mass parameter values, where the evaluated 
time points were & {6.50, 6.55, . . . , 7.50} and the con- 
sidered masses were 77 = 0.25, rric € {3.0, 3.1, . . . , 6.0}. 
The injected signal's parameter values always were 
among the grid points maximized over, so that sig- 
nal/template mismatch considerations are not of concern 
here. On the technical side, this is implemented in a 
loop over template waveforms (corresponding to different 
mass parameters) and time points. At each mass/time 
combination, computation of the conditionally maxi- 
mized Gaussian likelihood ratio amounts to computing 
an inner product / quadratic form (see Sec. 11 C), while 
maximizing the conditional Student-t likelihood requires 
iterating over several such least-squares fits within the 
EM algorithm (see Sec. IIIC). The EM iterations were 
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FIG. 4. ROC curves for the Gaussian and the Student-t detection statistics in both data scenarios. The shaded area marks 
the region where any sensible detection statistic (one that is not worse than mere guessing) should lie. 



terminated whenever the improvement in logarithmic 
likelihood over the previous iteration fell below 10~^. In 
this example setting, this lead to an average number of 
four EM iterations for each conditional likelihood max- 
imization in both noise scenarios. The eventual maxi- 
mized likelihood then is given by the overall maximum 
over the conditional maxima, and as the detection statis- 
tic we use the maximized likelihood ratio ^rr!^. The 

p(d|0) ^ 

algorithm used was essentially the one described in Ap- 
pendix A 3 d. 



E. Simulation results 

Figure 3 shows resulting detection statistic values 
(maximized likelihood ratios) under the Gaussian and the 
Student-i models both when a signal is injected as well as 
when he data are noise only. The signal injections here 
were all done at the same amplitude relative to the noise 
spectrum (SNR g = 5.257). In general, both detection 
statistics are very similar; the Student-t likelihood ratio 
tends to turn out slightly lower than the Gaussian one, 
in particular in the case of real interferometer noise. 

The question of to what extent these differences af- 
fect the ability to discriminate signals from noise will be 
approached by considering the receiver operating char- 
acteristic (ROC) curves. ROC curves are based on the 
detection statistics' (here: empirical) distributions. Plac- 
ing different detection thresholds on a detection statistic 
yields a corresponding false alarm probability (based on 
the distribution under the noise-only hypothesis) as well 
as a detection probability (based on the distribution un- 
der the particular signal hypothesis). The ROC curve 
illustrates these combinations over varying threshold val- 
ues [42]. 

Figure 4 shows ROC curves for the Gaussian and the 
Student-i filter for both noise cases. In the case of sim- 
ulated Gaussian noise, both detection statistics perform 
almost identically. For real instrument noise on the other 



hand, the Student-t model is able to provide a signif- 
icantly greater detection probability especially at low 
false-alarm probabilities. A remarkable feature of the 
ROC curves for instrumental noise is that for very low 
false alarm probabilities both filters eventually perform 
as poorly as mere guessing. The Student-t filter is able 
to sustain its discriminating power for lower false alarm 
rates, though. This effect is connected to the frequency 
of noise outliers ( "glitches" ) in the data, leading to very 
large detection statistic values even in the absence of 
a signal. Figure 5 shows the corresponding detection 
thresholds as a function of false-alarm probabilities. The 
point where the detection threshold reaches the injected 
signals' SNR is where the corresponding detection proba- 
bility is « 50%. One can see that, due to the heavy-tailed 
distribution of detection statistics in the case of actual in- 
strument noise, the detection threshold necessary for low 
false-alarm probabilities very quickly grows beyond val- 
ues that could obviously be attributed to be due to the 
signal injections considered here; the rate of noise tran- 
sients of "SNR" greater than the injections' SNR exceeds 
the false alarm rate (in a realistic search, some of these 
might actually be vetoed beforehand) . This effect is very 
obvious here also because signal injections were done only 
at a single SNR, but it will of course persist for other SNR 
distributions — assuming other SNR distributions for in- 
jections will affect the detection probability, but not the 
detection threshold, i.e., the detection procedure itself. 



The exact relative performance of both methods of 
course depends on the details of the particular detection 
problem, the kind of signal searched for, the parameter 
space, noise characteristics, data conditioning, and tun- 
ing parameters. The ROC curves shown above are based 
on a particular, artificial signal population, but their gen- 
eral features persist in a number of additional simulations 
not shown here, for a range of degrees-of-freedom set- 
tings, injection SNRs, data from a different instrument, 
and data from a different time period. 



11 



Gaussian noise, Gaussian filter 
Gaussian noise, Student-f filter 
Instrument noise, Gaussian filter 
Instrument noise, Student-f filter 




T \ 1 1 1 1 1 1 — 

0.002 0.010 0.050 0.200 1.000 

P(false alarm) 

FIG. 5. Detection thresholds on the maximized likelihood 
ratio (the detection statistic), corresponding to certain false- 
alarm probabilities. These thresholds are based on the detec- 
tion statistic's distribution in the absence of a signal. The 
horizontal line indicates the injected signals' SNR (see also 
Fig. 4). 



individual Vj parameters for different frequency ranges 
may yield a significant improvement. It may also make 
sense to specify the degrees-of-freedom parameter depen- 
dent on additional information, like e.g. the data quality 
category [45]. 

It will be interesting to study the Studcnt-t filter's per- 
formance in a realistic search for gravitational-wave sig- 
nals, in conjunction with the existing infrastructure (data 
quality flags, additional vetoes, etc.) and in comparison 
with the conventional matched filter [18, 19]. We are 
also investigating the use of the Student-^ model in the 
context of Bayesian model selection [46]. Here it may 
again yield a more robust discriminator for actual sig- 
nals against noise; on the computational side this prob- 
lem is based on integration of the likelihood, rather than 
maximization, and we do not expect a difference in com- 
putational cost between Gaussian and Studcnt-t models. 
We expect the Studcnt-i filtering procedure to be also 
useful in many other signal-processing contexts, wher- 
ever robustness or uncertainty in the power spectrum is 
an issue. 



CONCLUSIONS 
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We introduced a generalization of the matched filter 
that is commonly applied in signal detection problems. 
The Student-t filter is derived as a maximum-likelihood 
detection method that is based on a Student-t distribu- 
tion for the noise, rather than a Gaussian distribution, 
which would again yield the common matched filter in- 
stead. On the technical side, it generalizes a least-squares 
method to an adaptive variety. While a "Gaussian" 
matched filter is certainly appropriate when the assump- 
tion of stationary Gaussian noise and a known spectrum 
is met, there are several ways to motivate the Student-t 
filter as a robust alternative when these assumptions are 
violated: (i) ''theoretically": the Student-t model allows 
for uncertainty in the PSD, heavier-tailed noise and out- 
liers; (ii) ''heuristically" : the resulting adaptive least- 
squares method is less outlier-sensitive; or (iii) ''pragmat- 
ically" : the filter may turn out more effective in practice, 
as in the realistic example shown above. Besides that, 
being a generalization of the (Gaussian) matched filter, 
it should generally be able to perform as well or better. 
The question of course is whether the gain in detection 
efficiency is worth the additional implementation, tuning 
and computational effort. The difference in computa- 
tional cost for deriving both detection statistics suggests 
that a combined, hierarchical search strategy may also 
be worth considering. 

In the example shown above, the Student-t model's 
degrees-of-freedom parameter was treated as a single con- 
stant. In the context of gravitational- wave interferomet- 
ric data, this is an oversimplification; a study of actual 
instrument noise shows that the Fourier-domain data's 
tail behavior clearly depends on the frequency [43, 44]. 
Accounting for this effect in an actual search by fitting 
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APPENDIX 

1. Discrete Fourier transform 

The Fourier transform convention used in this paper is 
specified below; it is defined for a real-valued function h 
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of time t, sampled at N discrete time points, at a sam- 
pling rate of and it maps from 

{h{t) e R : t = 0, At, 2At, ...,{N- 1)AJ (Al) 

to a function of frequency / 

{Hf) e C : / = 0, A^, 2A/, . . . , (iV - 1)A/}, (A2) 

where A/ = and 

N-l 

Mf) = E exp(-27rijAJ) (A3) 

[3]. 



2. Applying the EM algorithm 

a. Preliminaries 

The expectation-maximization (EM) algorithm is re- 
quired for maximizing the Studcnt-t likelihood; sec 
Sec. inc. What is desired is the maximum of the 
marginal likelihood p(d\6), which is equivalent to the 
marginal density p{0\d) when assuming a uniform prior 
distribution on 9. What is required in order to apply 
the EM algorithm are expressions involving the marginal- 
ized cr| parameters, namely, the conditional distribution 
P(ct2|6», d) and the joint density p{e, a'^\d). The EM algo- 
rithm will then iteratively maximize the likelihood func- 
tion by performing alternating "expectation" and "max- 
imization" steps [8, 33]. 

The conditional posterior distribution P((t||6', d) of the 
jth variance parameter cr| for given data and signal se is 

I 



a scaled inverse distribution 



Inv-x + 2 



i.,5i(/,),+4^|d(/,)-Se(/,)| 



(A4) 



3] with probability density function 

^5i(/,)+4^|d(/,)-Se(/,] 



2a] 



(A5) 



f{aj)(x{aj) ' cxp 
[3]. 

The conditional distribution of the data d for given 
variances and signal parameters 9, P{y\9, i?^), is Gaus- 
sian [3], and the variance parameters' prior, P((t^), again 
was Inv-x^ [3]. The joint conditional density of 6 and 
for given data d is given by 

log(p(0,a2|j/)) (X log{p{y\9,a^)xp{9,a^)) (A6) 



i^\difj)-SeUj)\ 
^ 



2<T? 



-E((i + T)iogK) + ^ 



(A7) 



[3]. 



b. The E step 



For the EM algorithm's expectation step, one needs to 
evaluate the conditional posterior expectation 

Ep(^2|e=e„,y) [\og(p[9,a'^\y))] 
= llog{p{9,a^\y))p{a'\9^9,,y)da' (A9) 

as a function of 9 for some given 9q [8]. Here, 



J log{pi9,a^\y))p{a^\9 = 9o,y) da 



''jSi{f,)+i^\d{fj)-st,(fj)\ 



(AlO) 



X , , -(2+^) / '^J Si if, d(J,)-se, (/, ) I 



da] 



E 



4^ 



3 \ 



-(2+"-i) / Si (/, )+4# dU,)-se,, (f,)\ 



exp 



(All) 
(A12) 



(*) 



where 



exp 



^jSi(fj 



23^ 



4A1 

^ N 



d{fj)-seM,)\ 



r, (A13) 
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since the term marked by the asterisk (*) is the density 

^3 Si if J )+4^ I d{f, )-se„ {f,}\\ 



function of an Inv-x [Vj + 2, 
probabihty distribution, so that 

log(p(/3,a2|y))p(a2|/3 = /3o,2/)da2 



oc 



* JV 


\d(fj)-s, 




1 ^ 




+ 1\ N 






D 



3 4At 



T2 -Si (/j ) + 7nT2 ^ I ''(/j ) (/j ) 



(A14) 
^15) 



TABLE I. Matched filter, general implementation. 

norms = (ss,9, Ss,e; ^i) 
normC = (sc,e, Sc,9; 5*1 ) 

for (i = 1, . . . , m) do / / loop over time points: 
for (j — 0, . . . , N/2) do / / time shift the data: 
5: d'j = dj X exp{2-KifjTi) 
end for 

prodS = (Ss,e, d'; S'l) 
prodC = {sc.e, d'; Si) 

1 1 compute log-likelihood ratio / profile likelihood: 
10: maxLLR[i] = (prodS)^/normS + (prodC)^/normC 
end for 

return max(maxLLR) 



c. The M step 

In the EM algorithm's maximization step, the above 
expectation £{6q,8) (A15) needs to be maximized with 
respect to the parameter 9. The parameter value max- 
imizing the expectation then constitutes the next iter- 
ation's "new" 9o value, for which then the expectation 
again is maximized, and so forth [8]. As one can sec 
from expression (A15), maximization of the expectation 
again amounts to minimizing weighted least-squares, as 
in the Gaussian matched filter described above. 



b. The "Gaussian" matched filter: general implementation 

The first algorithm (Table I) is a "naive" matched-filter 
implementation that maximizes the likelihood (-ratio) 
over a given grid of m time points (r). The algorithm 
mainly consists of a loop over time points, where for 
each time point the (conditional) likelihood is maximized 
over amplitude and phase. In order to match signal and 
data for a certain signal arrival time, the data d are time 
shifted against the signal waveforms Sg/c- The eventual 
result is the profile likelihood evaluated at the specified 
time points, the maximum of which then constitutes the 
generalized likelihood ratio detection statistic that is re- 
turned. 



3. Pseudocode matched and Student-t filters 

a. Preliminaries 

This section sketches actual implementations of 
Student-i and (Gaussian) matched filters in comparison. 
In the following, we will use essentially the same con- 
ventions as before; we will be considering a time series d 
of length N, sampled at a sampling interval of At. The 
signal waveform here is assumed to be a linear combina- 
tion of a sine and a cosine component (ss,e, Sc.g), it has 
an associated arrival time parameter, and possibly ad- 
ditional parameters (as in (26)). Additional waveform 
parameters (other than amplitude, phase, and time) are 
then commonly treated by running several matched fil- 
ters corresponding to different values of 9. The general- 
ization to the case of more than two linear signal compo- 
nents should be straightforward. The profile likelihood 
will be evaluated along a discrete grid of time points 
Ti (i = l,...,m), where the special case of = lAj 
and TO = iV is of particular interest. The filter's output 
each time is a single number, the maximized (logarith- 
mic) likelihood ratio of signal vs. no-signal models. We 
will be making use of the inner product / quadratic form 
notation (a, 6; 5) as defined in (19). Implementations of 
the algorithms sketched in Sec. A3c and A3e are also 
provided in [35]. 



c. The "Gaussian" matched filter: efficient implementation 

If the time points to be maximized over are taken to 
be the same as the data time series' points (r^ = iAt, i = 
1, . . . , TO = N), then the matched- filtering procedure may 
be implemented much more efiiciently. The algorithm 
shown in Table II will give identical results to the pre- 
vious, but it is more efiicient as it takes advantage of 
a Fourier transform to essentially maximize over ampli- 



TABLE 11. Matched filter, efficient implementation. 



norms = {ss,e, Ss,e; Si) 
normC = (sc,e, Sc,e; Si) 

for {j = 0, . . . , (N — 1)) do // correlate data and signals: 
corS[j-f 1] = X S:,e,j / Si{f,) 
5: corC[j-hl] = X Km.j / Siifj) 
end for 

// apply Fourier transforms: 
FTS = DFT(corS) 
FTC = DFT(corC) 
10: for (i = 1, . . . , TV) do / / profile likelihood (-ratio): 

maxLLRH = {^)^ {^^^^^^^ + ^^^^^^i^) 
end for 

return max(maxLLR) 
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TABLE III. Student-t filter, general implementation. 

LLO — log(p(ci, 5*1, 1')) // log-likelihood noise-only model 
for (i = 1, . . . , m) do / / loop over time points: 
for (j = 0, . . . , N/2) do / / time shift the data: 
d'j — dj X exp(2TvifjTi) 
5: end for 

/ / EM-iterations: 

k = 1; Allr = 1; LLRprev = 0; 5T = Si 
while (Allr > Amax) and {k < k 

max 

) do 

norms = {sbM, Ss,e; Si) 
10: normC = {sc.e, Sc.e; St) 

prodS = {ss,e, d' \ S*) 

prodC = {sc,e, d'; Si) 

/3s — prodS/normS 

/3c = prodC/normC 
15: fi = d'— (/3sSs,9 + /3cSc,e) // vector of noise residuals 

LLl = log(p(n, 5*1, i^)) // log-likelihood signal model 

LLR — LLl — LLO / / log-likelihood ratio 

Allr ~ LLR — LLRprev 

LLRprev = LLR 
20: for (j = 0, . . . , N/2) do // adapt the spectrum: 

sUf,) = ^Si{f,)-^^,'-^\^,\' 

end for 

k = k + 1 
end while 

25: maxLLR[i] = LLR // profile likelihood (-ratio) 
end for 

return max(maxLLR) 



tudc, phase, and time simultaneously (see also Sec. II D). 
In practice, one may want to restrict the profile likeli- 
hood maximization (line 13) to the subset of sensible time 
shifts that do not "wrap" the signal circularly around 
the data's end points. Instead of a Fourier transform, 
one could also implement an inverse Fourier transform 
and would then also not need to time-reverse the result's 
indices (hne 11). 



TABLE IV. Student-f filter, efficient implementation. 

LLO = log(p(d. Si , i^)) // log-likelihood noise-only model 
/ / EM -iterations: 

fc = 1; Allr = 1; LLRprev = 0; 5"^ = Si 
while (Allr ^ A^ax) and [k < ^max 

) do 

5: // the "plain" matched filter: 
norms = (ss,e, Ss,e; SI) 
normC = (sc,9, Sc,9; Si) 
for (j = 0,...,(iV-l)) do 

corS[i + 1] = d, X K,B,j/Sl{fj) 
10: corC[i + 1] = d, X S^^,^- / Sl{f,) 

end for 

FTS = DFT(corS) 
FTC = DFT(corC) 
for (i = l,...,Af) do 

15: maxLLRM = (#)^ + 
end for 

/ / end of "plain" matched filter. 
/ / Determine best-fitting template, residuals, etc. : 
imax = arg maXj maxLLR[i] 
20: for (j = 0, . . . , N/2) do // time shift the data: 
d'j = dj X exp(27ri/jri^j^j.) 
end for 

prodS = (Sg.e, d' ; SI) 
prodC = (sc,e, d' ; SI) 
25: /3s = prodS/normS 
/3c ~ prodC/normC 

n — d' — (/3sSs,9 + /3cSc,e) // vector of noise residuals 
LLl — \og{p{n, SijV)) // log-likelihood signal model 
LLR — LLl — LLO // log-likelihood ratio 
30: Allr = LLR - LLRprev 
LLRprev = LLR 

for {j — 0, . . . , N/2) do / / adapt the spectrum: 
SUfs) = i;^Si(/,)-f ^2A,|~^|2 

end for 

35: k ^ k + 1 
end while 
return LLR 



d. The Student-t filter: general implementation 

This algorithm (see Table III) again is a "general" 
version of the Student-t filter, analogous to the general 
matched filter (Sec. A 3 b), where the set of time points r 
is not restricted. The EM algorithm here is applied at 
the level of each single amplitude/phase maximization 
conditional on some time shift Ti. The EM component 
requires the specification of a threshold Amax on the im- 
provement in logarithmic maximized likelihood ratio (e.g. 
10^^), and a threshold fcmax on the number of EM itera- 
tions (e.g. 100). The Student-t likelihood function 



p{x, Si,v) oc exp ( - ^ log 



1 

1 + — 



\ ~ |2 



N 
4At 



(see also (29)) only needs to be computed up to a pro- 
portionality constant here, as only the likelihood ratio is 
of eventual interest. 



e. The Student-t filter: efficient implementation 

The Student-t filter also may be implemented more 
efficiently in case the signal arrival times to maximize 
over are taken to be the time points of the original time 
series (r^ = iAt, i = l,...,m = iV, as in Sec. A3c). 
This implementation (Table IV) then requires one to 
move the level at which the EM-algorithm is applied 
from the conditional maximization over amplitude and 
phase to the joint amplitude/phase/timc maximization; 
effectively this implementation iteratively runs several 
matched filters (sec lines 6-16) while adapting the noise 
spectrum in between. It is unclear whether or how the 
level at which the EM-algorithm is applied affects the re- 
sults; as noted in Sec. HID, the likelihood may be multi- 
modal and different implementations might end up with 
differing maximization results, but whether this actually 
poses a problem in practice is not obvious. Computa- 
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dom. Then the random vector 






fohows a bivariate Student-t distribution with a diagonal 
covariance matrix, exactly like the real and imaginary 
components of n(fj) [8]. The root- mean-square figure 
corresponding to the power then may be written as 



\ 



2(72 



C Iv 



(A16) 

where the random variable D, being a ratio of dis- 
tributed random variables that are normalized by their 
respective degrees-of-freedom, follows an F(2, v) distri- 
bution with 2 and i/ degrees of freedom [7]. 



b. Probability density function, etc. 



FIG. 6. Probability density functions of Student-Rayleigh 
distributions for varying degrees of freedom v and fixed 
scale — 1. For v = oo, the distribution corresponds to 
the usual ("Gaussian") Rayleigh distribution. 



tionally, this latter implementation should be much eas- 
ier, though. Another difference to note is that while the 
matched filter allows us to return the profile likelihood 
as a function of time (the SNR time series), only the 
Student-i filter implementation from Sec. A 3 d is able to 
provide this, while the more efficient implementation will 
only return the overall maximum. 



4. The Student-Rayleigh distribution 

a. Relation to the F distribution 

The noise power's probability distribution under the 
Studcnt-t model (see (33), Sec. JVC) may be related 
to Snedecor's F distribution. First, real and imagi- 
nary parts of the jth element of the discretely Fourier- 
transformed vector n follow a multivariate (bivariate) 
Student-t distribution (see Sec. Ill A). Let A and B be 
independent Gaussian random variables with zero mean 
and standard deviation a. Furthermore, let C be a 
distributed random variable with v degrees of free- 



In the Gaussian noise model (see Sec. II B), the 
noise power at the jth frequency bin, |n(/j)|, follows a 
Rayleigh distribution with probability density function 

h{x\cj) = ^ exp(-^), (A17) 



where the scale parameter a is given as c = 




The analogue Student-Rayleigh probability distribu- 
tion in the Student-t noise model (see Sec. Ill A) is de- 
fined through its density function 

fMA<^.^) = 7^fFi2,u){^), (A18) 

where fF(2,u){') is the probability density function of an 
F{2, v) distribution with 2 and v degrees of freedom. 
Similarly, the cumulative distribution function and quan- 
tile function are given by 

FsR(x|a,i.) =Fj.(2,,)(^) and (A19) 
QsR(pk, v) = ^2^^^Q^^ZM, (A20) 

where Fp^2,v){') ^-nd QF(2,iy){') ^^'C the F distribution's 
cumulative distribution function and quantile function. 

Figure 6 illustrates probability density functions of 
Student-Rayleigh probability distributions for varying 
degrees of freedom v. For = oo, the distribution corre- 
sponds to the usual ("Gaussian") Rayleigh distribution. 
Note, in particular, the differing tail behavior (analogous 
to Fig. 1) that is apparent especially in the logarithmic 
plot. 
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